Efficient pre-processing in the parallel block-Jacobi SVD algorithm
Abstract
One way to speed up the computation of the singular value decomposition of a given matrix A ∈ C^{m×n}, m ≥ n, by the parallel two-sided block-Jacobi method is to apply pre-processing steps that concentrate the Frobenius norm near the diagonal. Such a concentration should lead to fewer outer parallel iteration steps needed for the convergence of the entire algorithm. It is shown experimentally that the QR factorization with complete column pivoting, optionally followed by the LQ factorization of the R-factor, can substantially decrease the number of outer parallel iteration steps; the details depend on the condition number and on the distribution of singular values, including their multiplicity. A subset of ill-conditioned matrices has been identified for which the dynamic ordering becomes inefficient. The best results in numerical experiments performed on a cluster of personal computers were achieved for well-conditioned matrices with a multiple minimal singular value, where the number of parallel iteration steps was reduced by two orders of magnitude. However, the gain in speed, as measured by the total parallel execution time, depends decisively on the implementation of the distributed QR and LQ factorizations on a given parallel architecture. In general, a reduction of the total parallel execution time of up to one order of magnitude has been achieved.
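The pre-processing idea in the abstract can be illustrated with a small serial sketch. It is not the authors' distributed implementation: it assumes a real matrix with full column rank, swaps in modified Gram-Schmidt for whatever QR kernel the paper uses, and the helper names (`r_factor`, `off_diagonal_fraction`, `transpose`) are hypothetical. It computes the R-factor of a column-pivoted QR factorization, then the L-factor of the LQ factorization of R, and reports how much of the squared Frobenius norm remains off the diagonal at each stage.

```python
import math


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def off_diagonal_fraction(M):
    """Share of the squared Frobenius norm that lies off the main diagonal."""
    total = sum(x * x for row in M for x in row)
    diag = sum(M[i][i] ** 2 for i in range(min(len(M), len(M[0]))))
    return (total - diag) / total


def r_factor(A, pivot=True):
    """Upper-triangular R of A*P = Q*R, by modified Gram-Schmidt.

    Assumes A is real, m x n with m >= n, and has full column rank.
    With pivot=True the remaining column of largest norm is chosen
    at every step (column pivoting); P and Q are not returned.
    """
    n = len(A[0])
    cols = transpose(A)                      # work on columns
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        if pivot:
            p = max(range(k, n), key=lambda j: sum(x * x for x in cols[j]))
            cols[k], cols[p] = cols[p], cols[k]
            for row in R[:k]:                # keep computed rows of R consistent
                row[k], row[p] = row[p], row[k]
        norm = math.sqrt(sum(x * x for x in cols[k]))
        R[k][k] = norm
        q = [x / norm for x in cols[k]]
        for j in range(k + 1, n):
            r = sum(a * b for a, b in zip(q, cols[j]))
            R[k][j] = r
            cols[j] = [a - r * b for a, b in zip(cols[j], q)]
    return R


def preprocess(A):
    """Return (R, L): pivoted QR of A, then LQ of R (via QR of R transposed)."""
    R = r_factor(A, pivot=True)
    L = transpose(r_factor(transpose(R), pivot=False))
    return R, L


A = [[1.0, 2.0], [3.0, 4.0]]
R, L = preprocess(A)
for name, M in (("A", A), ("R", R), ("L", L)):
    print(name, round(off_diagonal_fraction(M), 4))
```

For this 2 × 2 example the off-diagonal share drops from about 0.43 for A to about 0.33 for R and about 0.002 for L, while the Frobenius norm (and hence the singular values) is unchanged. Whether such a concentration reduces the number of Jacobi sweeps for a given matrix is the experimental question the abstract addresses; the sketch only shows the mechanism.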
Related articles
On Data Layout in the Parallel Block-Jacobi SVD Algorithm with Pre-processing
An efficient version of the parallel two-sided block-Jacobi algorithm for the singular value decomposition of an m × n matrix A includes the pre-processing step, which consists of the QR factorization of A with column pivoting followed by the optional LQ factorization of the R-factor. Then the iterative two-sided block-Jacobi algorithm is applied in parallel to the R-factor (or L-factor). Having...
Parallel One-Sided Block Jacobi SVD Algorithm: II. Implementation
This technical report is devoted to the description of implementation details of the accelerated parallel one-sided block Jacobi SVD algorithm, whose analysis and design were described in [21]. We discuss a suitable data layout for a parallel implementation of the algorithm on a parallel computer with distributed memory. This discussion is complicated by the fact that different computati...
Dynamic Ordering for the Parallel One-sided Block-Jacobi SVD Algorithm
The serial Jacobi algorithm (either one-sided or two-sided) for the computation of a singular value decomposition (SVD) of a general matrix has excellent numerical properties and parallelization potential, but it is considered to be the slowest method for computing the SVD. Even its parallelization with some parallel cyclic (static) ordering of subproblems does not lead to much improvement when...
Parallel Code for One-sided Jacobi-Method
The one-sided block Jacobi algorithm for the singular value decomposition (SVD) of a matrix can be a method of choice to compute the SVD efficiently and accurately in parallel. A given matrix is logically partitioned into block columns and is subjected to an iteration process. In each iteration step, for two given block columns, their Gram matrix is generated, its symmetric eigenvalue decomposition (EVD)...
Preconditioned Parallel Block-Jacobi SVD Algorithm
We show experimentally that the QR factorization with complete column pivoting, optionally followed by the LQ factorization of the R-factor, can lead to a substantial decrease of the number of outer parallel iteration steps in the parallel block-Jacobi SVD algorithm, whereby the details depend on the condition number and on the shape of the spectrum, including the multiplicity of singular value...
Preconditioning in the Parallel Block-Jacobi SVD Algorithm
One way to speed up the computation of the singular value decomposition of a given matrix A ∈ C^{m×n}, m ≥ n, by the parallel two-sided block-Jacobi method, consists of applying some pre-processing steps that would concentrate the Frobenius norm near the diagonal. Such a concentration should hopefully lead to fewer outer parallel iteration steps needed for the convergence of the entire algori...
Journal: Parallel Computing
Volume: 32
Publication date: 2006